
    Fast and Accurate Algorithm for Eye Localization for Gaze Tracking in Low Resolution Images

    Iris centre localization in low-resolution visible images is a challenging problem in the computer vision community due to noise, shadows, occlusions, pose variations, eye blinks, etc. This paper proposes an efficient method for determining the iris centre in low-resolution images in the visible spectrum, so that even low-cost consumer-grade webcams can be used for gaze tracking without any additional hardware. A two-stage algorithm that exploits the geometrical characteristics of the eye is proposed for iris centre localization. In the first stage, a fast convolution-based approach is used to obtain the coarse location of the iris centre (IC). The IC location is further refined in the second stage using boundary tracing and ellipse fitting. The algorithm has been evaluated on public databases such as BioID and Gi4E and is found to outperform state-of-the-art methods. Comment: 12 pages, 10 figures, IET Computer Vision, 201
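
    The two-stage idea in this abstract (a convolution-based coarse estimate followed by boundary-based ellipse fitting) can be illustrated with a short OpenCV sketch. The disc-shaped kernel, search window, Canny thresholds, and the synthetic eye patch are illustrative assumptions, not the parameters or exact procedure used in the paper.

```python
# Minimal sketch of a two-stage iris-centre estimate: a convolution-based
# coarse pass followed by edge-based ellipse fitting. All parameters are
# illustrative assumptions, not those used in the paper.
import cv2
import numpy as np

def coarse_iris_centre(eye_gray, radius=8):
    """Stage 1: convolve with a zero-mean dark-disc template, take the peak."""
    k = 2 * radius + 1
    kernel = np.zeros((k, k), np.float32)
    cv2.circle(kernel, (radius, radius), radius, 1.0, -1)   # filled disc
    kernel -= kernel.mean()                                  # zero-mean template
    response = cv2.filter2D(255 - eye_gray.astype(np.float32), cv2.CV_32F, kernel)
    y, x = np.unravel_index(np.argmax(response), response.shape)
    return x, y

def refine_iris_centre(eye_gray, coarse_xy, search=20):
    """Stage 2: fit an ellipse to edge points near the coarse estimate."""
    x0, y0 = coarse_xy
    edges = cv2.Canny(eye_gray, 50, 150)
    ys, xs = np.nonzero(edges)
    near = (np.abs(xs - x0) < search) & (np.abs(ys - y0) < search)
    pts = np.stack([xs[near], ys[near]], axis=1).astype(np.float32)
    if len(pts) < 5:                       # fitEllipse needs at least 5 points
        return float(x0), float(y0)
    (cx, cy), _, _ = cv2.fitEllipse(pts)
    return cx, cy

# Synthetic eye patch: bright sclera with a dark iris at (45, 30).
eye = np.full((60, 90), 200, np.uint8)
cv2.circle(eye, (45, 30), 10, 40, -1)
coarse = coarse_iris_centre(eye)
print("refined iris centre:", refine_iris_centre(eye, coarse))
```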

    Unsupervised Intuitive Physics from Visual Observations

    While learning models of intuitive physics is an increasingly active area of research, current approaches still fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test time. Some authors have relaxed such requirements by supplementing the model with a handcrafted physical simulator. Still, the resulting methods are unable to automatically learn new complex environments and to understand physical interactions within them. In this work, we demonstrate for the first time learning such predictors directly from raw visual observations and without relying on simulators. We do so in two steps: first, we learn to track mechanically-salient objects in videos using causality and equivariance, two unsupervised learning principles that do not require auto-encoding. Second, we demonstrate that the extracted positions are sufficient to successfully train visual motion predictors that can take the underlying environment into account. We validate our predictors on synthetic datasets; then, we introduce a new dataset, ROLL4REAL, consisting of real objects rolling on complex terrains (pool table, elliptical bowl, and random height-field). We show that in all such cases it is possible to learn reliable extrapolators of the object trajectories from raw videos alone, without any form of external supervision and with no more prior knowledge than the choice of a convolutional neural network architecture.
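
    A minimal sketch of the second step only (training a motion predictor on already-extracted object positions) is given below; the causality/equivariance tracker itself is not reproduced. The toy trajectories, the small fully connected network, and the window lengths are assumptions made purely for illustration.

```python
# Sketch of the second stage only: given object positions already extracted
# from video, train a small network to extrapolate the trajectory.
import torch
import torch.nn as nn

def synthetic_rolls(n=512, steps=12):
    """Toy trajectories: damped ballistic motion standing in for rolling objects."""
    t = torch.linspace(0, 1, steps)
    v = torch.randn(n, 2)
    pos = v[:, None, :] * t[:, None] - 0.5 * torch.tensor([0.0, 1.0]) * t[:, None] ** 2
    return pos                                              # (n, steps, 2) positions

history, horizon = 4, 1
traj = synthetic_rolls()
x = traj[:, :history, :].reshape(len(traj), -1)             # past positions
y = traj[:, history + horizon - 1, :]                       # position to predict

model = nn.Sequential(nn.Linear(history * 2, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):                                    # full-batch training
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("final MSE:", loss.item())
```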

    A comparison of model validation techniques for audio-visual speech recognition

    This paper implements and compares the performance of a number of techniques proposed for improving the accuracy of Automatic Speech Recognition (ASR) systems. As ASR that uses only the speech signal can be degraded by environmental noise, in some applications performance may be improved by employing Audio-Visual Speech Recognition (AVSR), in which recognition uses both audio information and mouth movements obtained from a video recording of the speaker’s face region. In this paper, model validation techniques, namely the holdout method, leave-one-out cross-validation, and bootstrap validation, are implemented to validate the performance of an AVSR system as well as to compare the performance of the validation techniques themselves. A new speech data corpus is used, namely the Loughborough University Audio-Visual (LUNA-V) dataset, which contains 10 speakers with five sets of samples uttered by each speaker. The database is divided into training and testing sets and processed in manners suitable for the validation techniques under investigation. Performance is evaluated over a range of signal-to-noise ratios using a variety of noise types obtained from the NOISEX-92 dataset.
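
    The three validation schemes compared in the paper can be sketched as follows. The logistic-regression classifier and the random stand-in features are assumptions that replace the actual audio-visual front end; only the holdout, leave-one-out, and bootstrap mechanics correspond to the techniques named above.

```python
# Minimal sketch of holdout, leave-one-out and bootstrap validation applied to
# placeholder features; the classifier and data stand in for the AVSR system.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, LeaveOneOut, cross_val_score
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                  # stand-in audio-visual features
y = rng.integers(0, 2, size=50)                # stand-in class labels
clf = LogisticRegression(max_iter=1000)

# Holdout: one fixed train/test split.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
holdout = clf.fit(Xtr, ytr).score(Xte, yte)

# Leave-one-out cross-validation: one sample held out per fold.
loo = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

# Bootstrap: train on a resample with replacement, test on out-of-bag samples.
boot_scores = []
for _ in range(100):
    idx = resample(np.arange(len(X)))
    oob = np.setdiff1d(np.arange(len(X)), idx)
    boot_scores.append(clf.fit(X[idx], y[idx]).score(X[oob], y[oob]))

print(f"holdout={holdout:.2f}  loo={loo:.2f}  bootstrap={np.mean(boot_scores):.2f}")
```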

    Recognition of 3-D Objects from Multiple 2-D Views by a Self-Organizing Neural Architecture

    The recognition of 3-D objects from sequences of their 2-D views is modeled by a neural architecture, called VIEWNET, that uses View Information Encoded With NETworks. VIEWNET illustrates how several types of noise and variability in image data can be progressively removed while incomplete image features are restored and invariant features are discovered using an appropriately designed cascade of processing stages. VIEWNET first processes 2-D views of 3-D objects using the CORT-X 2 filter, which discounts the illuminant, regularizes and completes figural boundaries, and removes noise from the images. Boundary regularization and completion are achieved by the same mechanisms that suppress image noise. A log-polar transform is taken with respect to the centroid of the resulting figure and then re-centered to achieve 2-D scale and rotation invariance. The invariant images are coarse coded to further reduce noise, reduce foreshortening effects, and increase generalization. These compressed codes are input into a supervised learning system based on the fuzzy ARTMAP algorithm. Recognition categories of 2-D views are learned before evidence from sequences of 2-D view categories is accumulated to improve object recognition. Recognition is studied with noisy and clean images using slow and fast learning. VIEWNET is demonstrated on an MIT Lincoln Laboratory database of 2-D views of jet aircraft with and without additive noise. A recognition rate of 90% is achieved with one 2-D view category and 98.5% with three 2-D view categories. National Science Foundation (IRI 90-24877); Office of Naval Research (N00014-91-J-1309, N00014-91-J-4100, N00014-92-J-0499); Air Force Office of Scientific Research (F9620-92-J-0499, 90-0083
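
    Only the invariance step described above (a log-polar transform taken about the figure centroid, followed by coarse coding) is sketched below with OpenCV; the CORT-X 2 filter, the re-centering step, and fuzzy ARTMAP are not reproduced, and the output sizes are assumptions.

```python
# Minimal sketch of the centroid-anchored log-polar transform plus coarse
# coding (average pooling). Output sizes are illustrative assumptions.
import cv2
import numpy as np

def logpolar_coarse_code(binary_figure, polar_size=(64, 64), code_size=(8, 8)):
    m = cv2.moments(binary_figure, binaryImage=True)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]        # figure centroid
    max_radius = np.hypot(*binary_figure.shape) / 2
    polar = cv2.warpPolar(binary_figure.astype(np.float32), polar_size,
                          (cx, cy), max_radius,
                          cv2.WARP_POLAR_LOG | cv2.INTER_LINEAR)
    # Coarse coding: pooling to a small grid reduces noise and foreshortening
    # effects before category learning.
    return cv2.resize(polar, code_size, interpolation=cv2.INTER_AREA)

figure = np.zeros((128, 128), np.uint8)
cv2.ellipse(figure, (70, 60), (40, 15), 30, 0, 360, 255, -1)  # toy silhouette
code = logpolar_coarse_code(figure)
print("coarse code shape:", code.shape)
```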

    Owl and Lizard: Patterns of Head Pose and Eye Pose in Driver Gaze Classification

    Accurate, robust, inexpensive gaze tracking in the car can help keep a driver safe by facilitating the more effective study of how to improve (1) vehicle interfaces and (2) the design of future Advanced Driver Assistance Systems. In this paper, we estimate head pose and eye pose from monocular video using methods developed extensively in prior work and ask two new interesting questions. First, how much better can we classify driver gaze using head and eye pose versus just using head pose? Second, are there individual-specific gaze strategies that strongly correlate with how much gaze classification improves with the addition of eye pose information? We answer these questions by evaluating data drawn from an on-road study of 40 drivers. The main insight of the paper is conveyed through the analogy of an "owl" and a "lizard", which describes the degree to which the eyes and the head move when shifting gaze. When the head moves a lot ("owl"), not much classification improvement is attained by estimating eye pose on top of head pose. On the other hand, when the head stays still and only the eyes move ("lizard"), classification accuracy increases significantly from adding in eye pose. We characterize how that accuracy varies between people, gaze strategies, and gaze regions. Comment: Accepted for Publication in IET Computer Vision. arXiv admin note: text overlap with arXiv:1507.0476
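
    The first question above (how much classification improves when eye pose is added to head pose) can be framed as a simple feature-ablation experiment. The sketch below uses random stand-in data, an assumed feature layout, and an off-the-shelf classifier; none of these correspond to the study's actual pipeline.

```python
# Sketch of the head-only versus head-plus-eye comparison with stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 600
head_pose = rng.normal(size=(n, 3))            # yaw, pitch, roll
eye_pose = rng.normal(size=(n, 2))             # horizontal, vertical eye angle
# Toy gaze-region labels that depend on both head and eye rotation.
gaze_region = ((head_pose[:, 0] + 0.8 * eye_pose[:, 0]) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
acc_head = cross_val_score(clf, head_pose, gaze_region, cv=5).mean()
acc_both = cross_val_score(clf, np.hstack([head_pose, eye_pose]),
                           gaze_region, cv=5).mean()

# A large gap suggests a "lizard" strategy (the eyes carry the information);
# a small gap suggests an "owl" strategy (head pose already explains the gaze).
print(f"head only: {acc_head:.2f}   head + eyes: {acc_both:.2f}")
```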

    Computationally efficient solutions for tracking people with a mobile robot: an experimental evaluation of Bayesian filters

    Service robots will soon become an essential part of modern society. As they have to move and act in human environments, it is essential for them to be provided with a fast and reliable tracking system that localizes people in their neighbourhood. It is therefore important to select the most appropriate filter for estimating the position of these people. This paper presents three efficient implementations of multisensor human tracking based on different Bayesian estimators: the Extended Kalman Filter (EKF), the Unscented Kalman Filter (UKF) and the Sampling Importance Resampling (SIR) particle filter. The system implemented on a mobile robot is explained, introducing the methods used to detect and estimate the position of multiple people. Then, the solutions based on the three filters are discussed in detail. Several real experiments are conducted to evaluate their performance, which is compared in terms of accuracy, robustness and execution time of the estimation. The results show that a solution based on the UKF can perform as well as particle filters and is often a better choice when computational efficiency is a key issue.
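
    Of the three estimators compared, the SIR particle filter is the easiest to sketch compactly. The constant-velocity motion model, the Gaussian measurement likelihood, and the noise levels below are illustrative assumptions rather than the robot's actual detection and tracking pipeline.

```python
# Minimal SIR particle filter tracking a single person's 2-D position under a
# constant-velocity model with Gaussian measurement noise (all values assumed).
import numpy as np

rng = np.random.default_rng(2)
n_particles, dt = 500, 0.1
particles = rng.normal(0, 1, size=(n_particles, 4))   # state: [x, y, vx, vy]
weights = np.full(n_particles, 1.0 / n_particles)

def sir_step(particles, weights, z, q=0.2, r=0.3):
    # Predict: constant-velocity motion plus process noise.
    particles[:, :2] += particles[:, 2:] * dt
    particles += rng.normal(0, q, size=particles.shape)
    # Update: weight by the Gaussian likelihood of the position measurement z.
    d2 = np.sum((particles[:, :2] - z) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / r**2)
    weights /= weights.sum()
    # Resample (importance resampling) to avoid weight degeneracy.
    idx = rng.choice(n_particles, size=n_particles, p=weights)
    return particles[idx], np.full(n_particles, 1.0 / n_particles)

true_pos = np.array([0.0, 0.0])
for step in range(50):
    true_pos = true_pos + np.array([0.05, 0.02])          # person walking
    z = true_pos + rng.normal(0, 0.3, size=2)             # noisy detection
    particles, weights = sir_step(particles, weights, z)

estimate = np.average(particles[:, :2], axis=0, weights=weights)
print("true:", true_pos, "estimate:", estimate)
```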

    A semi‐active human digital twin model for detecting severity of carotid stenoses from head vibration—A coupled computational mechanics and computer vision method

    In this work we propose a methodology to detect the severity of carotid stenosis from a video of a human face with the help of a coupled blood flow and head vibration model. This semi-active digital twin model is an attempt to link non-invasive video of a patient's face to the percentage of carotid occlusion. The pulsatile nature of blood flow through the carotid arteries induces a subtle head vibration. This vibration is a potential indicator of carotid stenosis severity, and it is exploited in the present study. A head vibration model is proposed that is driven by the forces generated by blood flow with or without occlusion, and it is used to generate a large number of virtual head vibration datasets for different degrees of occlusion. To determine the in vivo head vibration, a computer vision algorithm is applied to videos of human faces. The in vivo vibrations are then compared against the virtual vibration data generated from the coupled computational blood flow/vibration model to find the best fit between the in vivo and virtual data. Preliminary results on healthy subjects and a patient clearly indicate that the model is accurate and has the potential to detect the approximate severity of carotid artery stenoses.
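
    The final matching step (comparing an in vivo vibration trace against a library of model-generated virtual vibrations and picking the best fit) might look roughly as follows. The synthetic signal model and the RMSE similarity metric are stand-in assumptions, not the paper's coupled blood flow/vibration solver.

```python
# Sketch of the best-fit search between a measured vibration trace and a
# library of virtual traces indexed by occlusion level (all signals assumed).
import numpy as np

fs, seconds = 100, 4
t = np.arange(fs * seconds) / fs

def virtual_vibration(occlusion, heart_rate_hz=1.2):
    """Stand-in for the coupled blood-flow/vibration model output."""
    amplitude = 1.0 + 3.0 * occlusion          # occlusion amplifies pulsatile forcing
    return amplitude * np.sin(2 * np.pi * heart_rate_hz * t) \
         + 0.5 * occlusion * np.sin(2 * np.pi * 2 * heart_rate_hz * t)

occlusion_levels = np.linspace(0.0, 0.9, 10)   # virtual library: 0% to 90%
library = np.stack([virtual_vibration(o) for o in occlusion_levels])

# "Measured" in vivo vibration (here simulated at 60% occlusion plus noise).
measured = virtual_vibration(0.6) + np.random.default_rng(3).normal(0, 0.2, t.size)

rmse = np.sqrt(np.mean((library - measured) ** 2, axis=1))
best = occlusion_levels[np.argmin(rmse)]
print(f"estimated occlusion: {best:.0%}")
```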

    Enabling Hyper-Personalisation: Automated Ad Creative Generation and Ranking for Fashion e-Commerce

    The homepage is the first touch point in the customer's journey and is one of the prominent channels of revenue for many e-commerce companies. A user's attention is mostly captured by homepage banner images (also called Ads/Creatives). The set of banners shown and their design influence the customer's interest and play a key role in optimizing the click-through rates of the banners. Presently, massive and repetitive effort is put into manually creating aesthetically pleasing banner images. Due to the large amount of time and effort involved in this process, only a small set of banners are made live at any point. This reduces the number of banners created as well as the degree of personalization that can be achieved. This paper thus presents a method to generate creatives automatically on a large scale in a short duration. The availability of the diverse banners generated helps improve personalization, as they can cater to the tastes of a larger audience. The focus of our paper is on generating a wide variety of homepage banners that can serve as input to a user-level personalization engine. The main contributions of this paper are as follows: 1) we introduce and explain the need for large-scale banner generation for e-commerce; 2) we present how we utilize existing deep-learning-based detectors that can automatically annotate the required objects/tags from the image; 3) we propose a Genetic Algorithm based method to generate an optimal banner layout for the given image content, input components, and other design constraints; 4) further, to aid the process of picking the right set of banners, we design a ranking method and evaluate multiple models. All our experiments have been performed on data from Myntra (http://www.myntra.com), one of the top fashion e-commerce players in India. Comment: Workshop on Recommender Systems in Fashion, 13th ACM Conference on Recommender Systems, 201
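
    Contribution 3), the Genetic Algorithm based layout search, can be sketched as a small evolutionary loop over candidate component placements. The chromosome encoding, the fitness terms (out-of-bounds and overlap penalties), and the GA parameters below are illustrative assumptions, not the paper's actual constraints.

```python
# Minimal genetic-algorithm sketch: evolve (x, y) placements of banner
# components, penalising out-of-bounds placement and pairwise overlap.
import random

BANNER_W, BANNER_H = 1200, 400
COMPONENTS = [(400, 120), (300, 80), (200, 60)]    # (w, h): headline, offer, CTA

def random_layout():
    return [(random.uniform(0, BANNER_W), random.uniform(0, BANNER_H))
            for _ in COMPONENTS]

def fitness(layout):
    score = 0.0
    boxes = [(x, y, x + w, y + h) for (x, y), (w, h) in zip(layout, COMPONENTS)]
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # Penalise components that spill outside the banner.
        score -= max(0, -x0) + max(0, -y0)
        score -= max(0, x1 - BANNER_W) + max(0, y1 - BANNER_H)
        # Penalise pairwise overlap between components.
        for a0, b0, a1, b1 in boxes[i + 1:]:
            ox = max(0, min(x1, a1) - max(x0, a0))
            oy = max(0, min(y1, b1) - max(y0, b0))
            score -= ox * oy / 100.0
    return score

def evolve(pop_size=60, generations=200, mutation=0.2):
    pop = [random_layout() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(len(COMPONENTS))
            child = a[:cut] + b[cut:]                  # one-point crossover
            if random.random() < mutation:             # jitter one component
                i = random.randrange(len(COMPONENTS))
                child[i] = (child[i][0] + random.uniform(-50, 50),
                            child[i][1] + random.uniform(-50, 50))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print("best layout:", evolve())
```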

    Design and development of a low-cost mask-type eye tracker to collect quality fixation measurements in the sport domain

    The aim of the study was to build a low-cost mask-type eye tracker with accuracy and precision levels similar to those reported for commercial eye tracking devices. To this end, head-mounted hardware was designed and developed, while open-source software was modified for digital image capture, manipulation, and fixation analysis. An image recognition application was also included to handle different lighting scenarios. Moreover, parallax and viewing perspective errors were controlled to ensure the quality of data collection. The device was wireless and lightweight (99 g) to allow for natural movement and avoid participant discomfort. After calibration on a 9-target monocular grid, the spatial accuracy and precision of the eye tracker were evaluated with 30 participants, under four different lighting setups, both before and after a climbing task. Validity tests showed high levels of accuracy in all conditions, as evidenced by a systematic error of <0.5° on a 13-target grid. The reliability tests also showed consistent measurements, with no differences in accuracy recorded between participants, lighting conditions, and visual behaviors for the pre- versus post-climbing task. These results suggest that the present eye tracker achieves spatial accuracy similar to that of commercial systems. Altogether, this innovative user interface is suitable for research purposes and/or performance analysis in physical activity and sport-related activities. The features of this mask-type eye tracking system also make it a suitable perceptual user interface for investigating human–computer interactions in a large number of other research fields, including psychology, education, marketing, transportation, and medicine.
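
    Accuracy and precision figures of this kind are typically computed from gaze samples recorded while a participant fixates known targets. The sketch below uses common definitions (mean angular offset for accuracy, RMS sample-to-sample deviation for precision) together with assumed sample data and viewing distance; it is not necessarily the exact procedure used in the study.

```python
# Sketch of accuracy/precision computation in degrees of visual angle from one
# fixation on a known target; data, distance and definitions are assumptions.
import numpy as np

rng = np.random.default_rng(4)
viewing_distance_mm = 600.0

def to_degrees(offset_mm):
    """Convert an on-plane offset (mm) at the viewing distance to visual angle."""
    return np.degrees(np.arctan2(offset_mm, viewing_distance_mm))

# One fixation: known target position and 120 recorded gaze samples (mm on plane).
target = np.array([100.0, 50.0])
samples = target + rng.normal(0, 2.0, size=(120, 2)) + np.array([3.0, -1.0])  # bias

accuracy = to_degrees(np.linalg.norm(samples.mean(axis=0) - target))   # systematic error
precision = np.sqrt(np.mean(to_degrees(
    np.linalg.norm(np.diff(samples, axis=0), axis=1)) ** 2))           # RMS sample-to-sample

print(f"accuracy = {accuracy:.2f} deg, precision = {precision:.2f} deg")
```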